
Word Network Topic Model: A Simple but General Solution for Short and Imbalanced Texts



Abstract

Short text has been the prevalent format for information on the Internet in recent decades, especially with the development of online social media, whose millions of users generate a vast number of short messages every day. Although the sophisticated signals delivered by short text make it a promising source for topic modeling, its extreme sparsity and imbalance bring unprecedented challenges to conventional topic models such as LDA and its variants. Aiming to provide a simple but general solution for topic modeling in short texts, we present a word co-occurrence network based model named WNTM to tackle sparsity and imbalance simultaneously. Unlike previous approaches, WNTM models the distribution over topics for each word instead of learning topics for each document, which enhances the semantic density of the data space without adding much time or space complexity. Meanwhile, the rich contextual information preserved in the word-word space also ensures its sensitivity in identifying rare topics with convincing quality. Furthermore, employing the same Gibbs sampling as LDA makes WNTM easy to extend to various application scenarios. Extensive validation on both short and normal texts shows that WNTM outperforms baseline methods. Finally, we also demonstrate its potential for precisely discovering newly emerging topics or unexpected events in Weibo at very early stages.
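The abstract does not spell out how the word co-occurrence network is built, so the following is only a minimal sketch of one plausible reading. It assumes that two words are linked when they fall inside the same sliding window, and that each word is then represented by a pseudo-document made of its neighbours, which a standard LDA Gibbs sampler could consume. The function names, the `window` parameter, and the pseudo-document construction are illustrative assumptions, not details confirmed by the text above.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_network(docs, window=3):
    """Count co-occurrences of distinct words inside a sliding window.

    `docs` is a list of token lists. The window size is an assumed
    hyperparameter; the abstract does not state one.
    """
    edges = Counter()
    for tokens in docs:
        # A text shorter than the window is treated as a single window.
        n_windows = max(1, len(tokens) - window + 1)
        for start in range(n_windows):
            span = sorted(set(tokens[start:start + window]))
            for u, v in combinations(span, 2):
                edges[(u, v)] += 1  # undirected edge, key stored in sorted order
    return edges


def build_pseudo_documents(edges):
    """Turn each word's weighted neighbourhood into a pseudo-document.

    Assumption: a neighbour is repeated once per co-occurrence count, so
    that topics can be inferred per word rather than per original document.
    """
    pseudo = defaultdict(list)
    for (u, v), weight in edges.items():
        pseudo[u].extend([v] * weight)
        pseudo[v].extend([u] * weight)
    return dict(pseudo)
```

Under these assumptions, the pseudo-documents are much denser than the original short texts, since a word's neighbours are pooled across every message in which it appears. That pooling is one way to read the abstract's claim about enhancing the semantic density of the data space.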
